Crate email_address

A Rust crate providing an implementation of an RFC-compliant EmailAddress newtype.

Primarily intended for validation, the EmailAddress type is constructed with FromStr::from_str, which returns any parsing errors. Before construction, the functions is_valid, is_valid_local_part, and is_valid_domain may be used to test for validity without creating an instance. The crate supports all of the RFC ASCII and UTF-8 character set rules, including quoted and unquoted local parts, but does not yet support all of the productions required for SMTP headers, such as folding whitespace and comments.

"Simon Johnston <johnstonsk@gmail.com>"
                 ^------------------^ email()
                            ^-------^ domain()
                 ^--------^ local_part()
 ^------------^ display_part()
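
The spans in the diagram can be reproduced with plain string slicing. The sketch below is an illustration only and is not the crate's parser: `split_display` is a hypothetical helper, it uses only the standard library, and it assumes a single angle-bracketed address with no comments or folding whitespace.

```rust
/// Hypothetical helper (not part of the crate): splits
/// `Display Name <local@domain>` into
/// (display part, email, local part, domain).
fn split_display(input: &str) -> Option<(&str, &str, &str, &str)> {
    // The address sits between the last '<' and a trailing '>'.
    let open = input.rfind('<')?;
    let email = input[open + 1..].strip_suffix('>')?;
    let display = input[..open].trim();
    // Split on the last '@' so that an '@' inside a quoted local
    // part does not end up in the domain.
    let at = email.rfind('@')?;
    let (local, domain) = (&email[..at], &email[at + 1..]);
    if local.is_empty() || domain.is_empty() {
        return None;
    }
    Some((display, email, local, domain))
}
```

Splitting on the last `@` is a deliberate choice here: the domain grammar never contains `@`, while a quoted local part may.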

§Example

The following shows the basic is_valid and from_str functions.

use email_address::*;
use std::str::FromStr;

assert!(EmailAddress::is_valid("user.name+tag+sorting@example.com"));

assert_eq!(
    EmailAddress::from_str("Abc.example.com"),
    Error::MissingSeparator.into()
);

The following shows the functions used to output an email address in different formats.

use email_address::*;
use std::str::FromStr;

let email = EmailAddress::from_str("johnstonsk@gmail.com").unwrap();

assert_eq!(
    email.to_string(),
    "johnstonsk@gmail.com".to_string()
);

assert_eq!(
    String::from(email.clone()),
    "johnstonsk@gmail.com".to_string()
);

assert_eq!(
    email.as_ref(),
    "johnstonsk@gmail.com"
);

assert_eq!(
    email.to_uri(),
    "mailto:johnstonsk@gmail.com".to_string()
);

assert_eq!(
    email.to_display("Simon Johnston"),
    "Simon Johnston <johnstonsk@gmail.com>".to_string()
);

§Specifications

  1. RFC 1123: Requirements for Internet Hosts – Application and Support, IETF, Oct 1989.
  2. RFC 3629: UTF-8, a transformation format of ISO 10646, IETF, Nov 2003.
  3. RFC 3696: Application Techniques for Checking and Transformation of Names, IETF, Feb 2004.
  4. RFC 4291: IP Version 6 Addressing Architecture, IETF, Feb 2006.
  5. RFC 5234: Augmented BNF for Syntax Specifications: ABNF, IETF, Jan 2008.
  6. RFC 5321: Simple Mail Transfer Protocol, IETF, Oct 2008.
  7. RFC 5322: Internet Message Format, IETF, Oct 2008.
  8. RFC 5890: Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework, IETF, Aug 2010.
  9. RFC 6531: SMTP Extension for Internationalized Email, IETF, Feb 2012.
  10. RFC 6532: Internationalized Email Headers, IETF, Feb 2012.

From RFC 5322: §3.2.1. Quoted characters:

quoted-pair     =   ("\" (VCHAR / WSP)) / obs-qp

From RFC 5322: §3.2.2. Folding White Space and Comments:

FWS             =   ([*WSP CRLF] 1*WSP) /  obs-FWS
                                       ; Folding white space

ctext           =   %d33-39 /          ; Printable US-ASCII
                    %d42-91 /          ;  characters not including
                    %d93-126 /         ;  "(", ")", or "\"
                    obs-ctext

ccontent        =   ctext / quoted-pair / comment

comment         =   "(" *([FWS] ccontent) [FWS] ")"

CFWS            =   (1*([FWS] comment) [FWS]) / FWS

From RFC 5322: §3.2.3. Atom:

atext           =   ALPHA / DIGIT /    ; Printable US-ASCII
                    "!" / "#" /        ;  characters not including
                    "$" / "%" /        ;  specials.  Used for atoms.
                    "&" / "'" /
                    "*" / "+" /
                    "-" / "/" /
                    "=" / "?" /
                    "^" / "_" /
                    "`" / "{" /
                    "|" / "}" /
                    "~"

atom            =   [CFWS] 1*atext [CFWS]

dot-atom-text   =   1*atext *("." 1*atext)

dot-atom        =   [CFWS] dot-atom-text [CFWS]

specials        =   "(" / ")" /        ; Special characters that do
                    "<" / ">" /        ;  not appear in atext
                    "[" / "]" /
                    ":" / ";" /
                    "@" / "\" /
                    "," / "." /
                    DQUOTE
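
As a rough illustration of the atom rules above, the following standard-library sketch checks the ASCII `atext` and `dot-atom-text` productions. The function names are hypothetical and this is not the crate's implementation; CFWS and the obsolete forms are ignored.

```rust
/// ASCII `atext` from RFC 5322 §3.2.3: letters, digits, and the
/// listed punctuation; none of the `specials`.
fn is_atext(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+-/=?^_`{|}~".contains(c)
}

/// `dot-atom-text = 1*atext *("." 1*atext)`: one or more non-empty
/// runs of atext separated by single dots.
fn is_dot_atom_text(s: &str) -> bool {
    s.split('.')
        .all(|atom| !atom.is_empty() && atom.chars().all(is_atext))
}
```

Because every dot-separated run must be non-empty, leading dots, trailing dots, and consecutive dots are all rejected without special cases.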

From RFC 5322: §3.2.4. Quoted Strings:

qtext           =   %d33 /             ; Printable US-ASCII
                    %d35-91 /          ;  characters not including
                    %d93-126 /         ;  "\" or the quote character
                    obs-qtext

qcontent        =   qtext / quoted-pair

quoted-string   =   [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]
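
A simplified reading of the quoted-string rules can be sketched as follows. This is an assumption-laden illustration, not the crate's code: the function names are hypothetical, folding whitespace is reduced to plain space and tab, and CFWS, CRLF folding, and the obs-* forms are left out.

```rust
/// `qtext`: printable US-ASCII except `"` (34) and `\` (92).
fn is_qtext(c: char) -> bool {
    matches!(c as u32, 33 | 35..=91 | 93..=126)
}

/// Simplified `quoted-string`: DQUOTE, then any mix of qtext,
/// quoted-pairs, and space/tab, then DQUOTE.
fn is_quoted_string(s: &str) -> bool {
    let inner = match s.strip_prefix('"').and_then(|r| r.strip_suffix('"')) {
        Some(inner) => inner,
        None => return false,
    };
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        let ok = match c {
            // quoted-pair = "\" (VCHAR / WSP): consume the escaped char.
            '\\' => matches!(
                chars.next(),
                Some(n) if n == ' ' || n == '\t' || n.is_ascii_graphic()
            ),
            // Stand-in for FWS without CRLF folding.
            ' ' | '\t' => true,
            _ => is_qtext(c),
        };
        if !ok {
            return false;
        }
    }
    true
}
```

Note that a backslash immediately before the final quote escapes it, so such a string is treated as unterminated and rejected.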

From RFC 5322: §3.4.1. Addr-Spec Specification:

addr-spec       =   local-part "@" domain

local-part      =   dot-atom / quoted-string / obs-local-part

domain          =   dot-atom / domain-literal / obs-domain

domain-literal  =   [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]

dtext           =   %d33-90 /          ; Printable US-ASCII
                    %d94-126 /         ;  characters not including
                    obs-dtext          ;  "[", "]", or "\"
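
The `dtext` and `domain-literal` productions can be illustrated with a short standard-library sketch. The names below are hypothetical and not taken from the crate; FWS, CFWS, and obs-dtext are ignored, and the content between the brackets is not checked for being a well-formed IP address.

```rust
/// `dtext`: printable US-ASCII except `[` (91), `\` (92), `]` (93).
fn is_dtext(c: char) -> bool {
    matches!(c as u32, 33..=90 | 94..=126)
}

/// Simplified `domain-literal`: "[" *dtext "]" with no FWS or CFWS.
fn is_domain_literal(s: &str) -> bool {
    s.strip_prefix('[')
        .and_then(|rest| rest.strip_suffix(']'))
        .map_or(false, |inner| inner.chars().all(is_dtext))
}
```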

RFC 3696, §3. Restrictions on email addresses describes in detail the quoting of characters in an address.

§Unicode

RFC 6531, §3.3. Extended Mailbox Address Syntax extends the rules above for non-ASCII character sets.

sub-domain   =/  U-label
    ; extend the definition of sub-domain in RFC 5321, Section 4.1.2

atext   =/  UTF8-non-ascii
    ; extend the implicit definition of atext in
    ; RFC 5321, Section 4.1.2, which ultimately points to
    ; the actual definition in RFC 5322, Section 3.2.3

qtextSMTP  =/ UTF8-non-ascii
    ; extend the definition of qtextSMTP in RFC 5321, Section 4.1.2

esmtp-value  =/ UTF8-non-ascii
    ; extend the definition of esmtp-value in RFC 5321, Section 4.1.2

A “U-label” is an IDNA-valid string of Unicode characters, in Normalization Form C (NFC) and including at least one non-ASCII character, expressed in a standard Unicode Encoding Form (such as UTF-8). It is also subject to the constraints about permitted characters that are specified in Section 4.2 of the Protocol document and the rules in the Sections 2 and 3 of the Tables document, the Bidi constraints in that document if it contains any character from scripts that are written right to left, and the symmetry constraint described immediately below. Conversions between U-labels and A-labels are performed according to the “Punycode” specification RFC3492, adding or removing the ACE prefix as needed.

RFC 6532: §3.1 UTF-8 Syntax and Normalization, and §3.2 Syntax Extensions to RFC 5322 extend the syntax above with:

UTF8-non-ascii  =   UTF8-2 / UTF8-3 / UTF8-4

...

VCHAR   =/  UTF8-non-ascii

ctext   =/  UTF8-non-ascii

atext   =/  UTF8-non-ascii

qtext   =/  UTF8-non-ascii

text    =/  UTF8-non-ascii
              ; note that this upgrades the body to UTF-8

dtext   =/  UTF8-non-ascii
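
To show what the `=/` extensions mean in practice, here is a minimal sketch of `atext` widened with `UTF8-non-ascii`. The function name is hypothetical and this is not the crate's implementation; no normalization (such as NFC) is checked.

```rust
/// `atext` with the RFC 6532 extension `atext =/ UTF8-non-ascii`.
/// A non-ASCII Rust `char` is always a Unicode scalar value, which
/// encodes to a well-formed 2 to 4 byte UTF-8 sequence, so no
/// byte-level check is needed at this level.
fn is_atext_utf8(c: char) -> bool {
    !c.is_ascii()
        || c.is_ascii_alphanumeric()
        || "!#$%&'*+-/=?^_`{|}~".contains(c)
}
```

The ASCII rules are unchanged by the extension, so specials such as `@` and `.` are still rejected.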

These in turn refer to RFC 3629 §4. Syntax of UTF-8 Byte Sequences:

A UTF-8 string is a sequence of octets representing a sequence of UCS characters. An octet sequence is valid UTF-8 only if it matches the following syntax, which is derived from the rules for encoding UTF-8 and is expressed in the ABNF of [RFC2234].

   UTF8-octets = *( UTF8-char )
   UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
   UTF8-1      = %x00-7F
   UTF8-2      = %xC2-DF UTF8-tail
   UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
   UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
                 %xF4 %x80-8F 2( UTF8-tail )
   UTF8-tail   = %x80-BF
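
The byte-level grammar above translates almost line for line into a matcher. The following sketch uses hypothetical function names and only the standard library; in real Rust code `std::str::from_utf8` performs this check already, so this is purely to make the productions concrete.

```rust
/// Length of the UTF8-char at the start of `b`, or None if no
/// production of the grammar matches.
fn utf8_char_len(b: &[u8]) -> Option<usize> {
    // True when byte `i` exists and lies in lo..=hi.
    let in_range =
        |i: usize, lo: u8, hi: u8| b.get(i).map_or(false, |&x| lo <= x && x <= hi);
    // UTF8-tail = %x80-BF
    let tail = |i: usize| in_range(i, 0x80, 0xBF);
    match *b.first()? {
        0x00..=0x7F => Some(1), // UTF8-1
        0xC2..=0xDF if tail(1) => Some(2), // UTF8-2
        // UTF8-3: E0 excludes overlong forms, ED excludes surrogates.
        0xE0 if in_range(1, 0xA0, 0xBF) && tail(2) => Some(3),
        0xE1..=0xEC | 0xEE..=0xEF if tail(1) && tail(2) => Some(3),
        0xED if in_range(1, 0x80, 0x9F) && tail(2) => Some(3),
        // UTF8-4: F0 excludes overlong forms, F4 caps at U+10FFFF.
        0xF0 if in_range(1, 0x90, 0xBF) && tail(2) && tail(3) => Some(4),
        0xF1..=0xF3 if tail(1) && tail(2) && tail(3) => Some(4),
        0xF4 if in_range(1, 0x80, 0x8F) && tail(2) && tail(3) => Some(4),
        _ => None,
    }
}

/// UTF8-octets = *( UTF8-char )
fn is_valid_utf8(mut bytes: &[u8]) -> bool {
    while !bytes.is_empty() {
        match utf8_char_len(bytes) {
            Some(n) => bytes = &bytes[n..],
            None => return false,
        }
    }
    true
}
```

The restricted second-byte ranges after `E0`, `ED`, `F0`, and `F4` are what rule out overlong encodings, UTF-16 surrogates, and code points beyond U+10FFFF.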

Comments in addresses are discussed in RFC 5322 Appendix A.5. White Space, Comments, and Other Oddities.

An informal description can be found on Wikipedia.

Structs§

  • Type representing a single email address. This is basically a wrapper around a String; the email address is parsed for correctness with FromStr::from_str, which is the only way to create an instance. The various components of the email are not parsed out to be accessible independently.
  • Struct of options that can be configured when parsing with parse_with_options.

Enums§

  • Error type used when parsing an address.